GridFTP Pipelining

Authors

  • John Bresnahan
  • Michael Link
  • Rajkumar Kettimuthu
  • Dan Fraser
  • Ian Foster

Abstract

GridFTP is an exceptionally fast transfer protocol for large volumes of data. Implementations of it are widely deployed and used in well-connected Grid environments such as those of the TeraGrid because of its ability to scale to network speeds. However, when the data is partitioned into many small files instead of a few large files, it suffers from lower transfer rates. The latency between the serialized transfer requests of each file directly reduces the amount of time data pathways are active, thus lowering achieved throughput. Further, when a data pathway is inactive, the TCP window closes, and TCP must go through the slow-start algorithm again. The performance penalty can be severe. This situation is known as the “lots of small files” problem. In this paper we introduce a solution to this problem. This solution, called pipelining, allows many transfer requests to be sent to the server before any one completes. Thus, pipelining hides the latency of each transfer request by sending the requests while a data transfer is in progress. We present an implementation and performance study of the pipelining solution.
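
The latency argument in the abstract can be illustrated with a toy cost model. The sketch below is not taken from the paper: the function names, parameters, and the assumption that every file has the same size are all hypothetical, and the model ignores TCP slow start, so it understates the real penalty of serialized requests.

```python
# Toy model (an assumption for illustration, not the paper's method):
# every file has the same size, the link runs at a fixed bandwidth, and
# each transfer request costs one fixed request latency.

def serialized_time(n_files, file_bytes, bandwidth_bytes_per_s, request_latency_s):
    """Each request is sent only after the previous transfer completes,
    so the request latency is paid once per file."""
    per_file = request_latency_s + file_bytes / bandwidth_bytes_per_s
    return n_files * per_file


def pipelined_time(n_files, file_bytes, bandwidth_bytes_per_s, request_latency_s):
    """Requests are sent while earlier transfers are still in progress,
    so only the first request's latency is exposed."""
    return request_latency_s + n_files * file_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical workload: 1000 files of 1 MB over a 1 Gbit/s link
    # (125e6 bytes/s) with 50 ms of latency per request.
    args = (1000, 1_000_000, 125e6, 0.05)
    print(f"serialized: {serialized_time(*args):.2f} s")
    print(f"pipelined:  {pipelined_time(*args):.2f} s")
```

Under these assumed numbers the serialized case takes about 58 s and the pipelined case about 8 s, because the per-request latency dominates the transfer time of each small file.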

Similar resources

Fast Parallel File Replication In Data Grid

Parallel file replication, where a large file needs to be replicated simultaneously to multiple sites, is an integral part of data-intensive grid environments. Current data transport mechanisms such as GridFTP are mainly designed for point-to-point file transfer and not for the parallel point-to-multipoint transfer required in replication. This paper presents a tool that creates multiple distribution t...

A Flexible GridFTP Client For Implementation of Intelligent Cloud Data Scheduling Services

Current Cloud providers strive to provide novel services and tools for the management, analysis, access and scheduling of Big Data that is generated in massive amounts by a large variety of sources. These tools and services have to be flexible and scalable enough to manage data at exascale with the help of data centers that can hold thousands of compute and storage nodes interconnected ...

Performance Evaluation of Data Transfer Protocol GridFTP for Grid Computing

In Grid computing, a data transfer protocol called GridFTP has been widely used for efficiently transferring a large volume of data. Currently, two versions of GridFTP protocols, GridFTP version 1 (GridFTP v1) and GridFTP version 2 (GridFTP v2), have been proposed in the GGF. GridFTP v2 supports several advanced features such as data streaming, dynamic resource allocation, and checksum transfer...

GridFTP-APT: Automatic Parallelism Tuning Mechanism for GridFTP in Long-Fat Networks

In this paper, we propose an extension to GridFTP that optimizes its performance by dynamically adjusting the number of parallel TCP connections. GridFTP has been used as a data transfer protocol to effectively transfer a large volume of data in Grid computing. GridFTP supports a feature called parallel data transfer that improves throughput by establishing multiple TCP connections in parallel....

A GridFTP Transport Driver for Globus XIO

GridFTP is a high-performance, reliable data transfer protocol optimized for high-bandwidth wide-area networks. Based on the Internet FTP protocol, it defines extensions for high-performance operation and security. The Globus implementation of GridFTP provides a modular and extensible data transfer system architecture suitable for wide-area and high-performance environments. GridFTP is the de fa...

Publication date: 2007